A Bloom filter based semi-index on q-grams

Authors

  • Szymon Grabowski
  • Robert Susik
  • Marcin Raniszewski
Abstract

We present a simple q-gram based semi-index, which typically needs to search for a pattern in only a small fraction of the text blocks. Several space-time tradeoffs are presented. Experiments on Pizza & Chili datasets show that our solution is up to three orders of magnitude faster than the Claude et al. [4] semi-index at a comparable space usage.
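The abstract does not spell out the construction, so the following is only a minimal sketch of one plausible reading: split the text into fixed-size blocks, store the q-grams of each block in a small Bloom filter, and scan a block only if every q-gram of the pattern passes that block's filter. All names and parameters here (`QGramSemiIndex`, `block_size`, `num_bits`, `num_hashes`, `max_pattern_len`, the BLAKE2-based double hashing) are illustrative assumptions, not the authors' implementation or settings.

```python
import hashlib


def _bit_positions(gram, num_bits, num_hashes):
    """Derive num_hashes bit positions for a q-gram via double hashing (an assumed scheme)."""
    digest = hashlib.blake2b(gram.encode("utf-8"), digest_size=16).digest()
    h1 = int.from_bytes(digest[:8], "little")
    h2 = int.from_bytes(digest[8:], "little") | 1  # force an odd step
    return [(h1 + i * h2) % num_bits for i in range(num_hashes)]


class QGramSemiIndex:
    """Hypothetical block-wise Bloom filter over q-grams; keeps the text itself."""

    def __init__(self, text, block_size=64, q=3, num_bits=1024,
                 num_hashes=3, max_pattern_len=16):
        self.text = text
        self.block_size = block_size
        self.q = q
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.max_pattern_len = max_pattern_len
        self.filters = []  # one Python int used as a bitset per block
        # Each block also indexes the next max_pattern_len - 1 characters,
        # so a match that starts in a block but crosses its end is not missed.
        overlap = max_pattern_len - 1
        for start in range(0, len(text), block_size):
            window = text[start:start + block_size + overlap]
            bits = 0
            for i in range(len(window) - q + 1):
                for pos in _bit_positions(window[i:i + q], num_bits, num_hashes):
                    bits |= 1 << pos
            self.filters.append(bits)

    def _may_contain(self, bits, gram):
        positions = _bit_positions(gram, self.num_bits, self.num_hashes)
        return all((bits >> pos) & 1 for pos in positions)

    def candidate_blocks(self, pattern):
        """Blocks whose filter accepts every q-gram of the pattern."""
        if not self.q <= len(pattern) <= self.max_pattern_len:
            raise ValueError("pattern length must be between q and max_pattern_len")
        grams = {pattern[i:i + self.q] for i in range(len(pattern) - self.q + 1)}
        return [b for b, bits in enumerate(self.filters)
                if all(self._may_contain(bits, g) for g in grams)]

    def find_all(self, pattern):
        """Verify candidates by scanning the text; returns start positions in order."""
        hits = []
        for b in self.candidate_blocks(pattern):
            start = b * self.block_size
            # Window ends so that every match found here starts inside block b.
            window = self.text[start:start + self.block_size + len(pattern) - 1]
            i = window.find(pattern)
            while i != -1:
                hits.append(start + i)
                i = window.find(pattern, i + 1)
        return hits
```

Because a Bloom filter has no false negatives, a block containing the pattern is never skipped; false positives only cost an unnecessary scan. In this sketch, larger `num_bits` lowers the false-positive rate at the price of more space, which is the kind of space-time tradeoff the abstract refers to.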

Similar resources

Automated Methods for Estimating Baseflow from Streamflow Records in a Semi Arid Watershed

Understanding of the runoff generation processes is important in understanding the magnitude and dynamics of groundwater discharge. However, these processes continue to be difficult to quantify and conceptualize. In this study, two digital filter based separation modules, the Recursive filtering method (RDF) and a generalization of the recursive digital filter (GRDF) were 1991–2002 in the Hableh Ro...

Anagram: A Content Anomaly Detector Resistant to Mimicry Attack

In this paper, we present Anagram, a content anomaly detector that models a mixture of high-order n-grams (n > 1) designed to detect anomalous and “suspicious” network packet payloads. By using higher-order n-grams, Anagram can detect significant anomalous byte sequences and generate robust signatures of validated malicious packet content. The Anagram content models are implemented using highly...

Private record linkage with Bloom filters

In many record linkage applications, identifiers have to be encrypted to preserve privacy. Therefore, a method for approximate string comparison in private record linkage is needed. We describe a new method of approximate string comparison in private record linkage. The main idea is to store q-grams sets derived from identifier values in Bloom filters and compare them bitwise across databases. ...

Privacy-preserving record linkage using Bloom filters

BACKGROUND Combining multiple databases with disjunctive or additional information on the same person is occurring increasingly throughout research. If unique identification numbers for these individuals are not available, probabilistic record linkage is used for the identification of matching record pairs. In many applications, identifiers have to be encrypted due to privacy concerns. METHOD...

Journal title:
  • Softw., Pract. Exper.

Volume 47

Publication date 2017